Dimensional data modelling is a technique for organizing data in a database so that it can be easily analyzed and queried. It is often used in business intelligence and data warehousing applications to represent complex relationships between data in a simple and intuitive way.
Here are some common interview questions that you may encounter when interviewing for a role that involves dimensional data modelling.
In this article, we will cover the following:
Most frequently asked Dimensional Modeling Interview Questions
Ans: Dimensional modelling is often used in data warehousing. In simpler words, it is a rational and consistent design technique used to build a data warehouse. DM uses the facts and dimensions of a warehouse for its design. The model is represented by a star or snowflake schema.
Ans:
Star Schema:
Well, in a star schema you enter your desired facts and the primary keys of your dimension tables in the fact table, and the fact table's primary key is the union of all its dimension table keys. In a star schema, dimension tables are usually not in BCNF form.
SnowFlake:
It is almost like a star schema, but here the dimension tables are in 3NF, so there are more dimension tables, and these dimension tables are linked by primary and foreign key relations.
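As a quick sketch of the difference (table and column names here are illustrative, not taken from any particular system), the same product dimension could be modelled both ways:

```sql
-- Star schema: one denormalized dimension table per dimension;
-- the fact table's primary key is the union of its dimension keys.
CREATE TABLE dim_product (
    product_key   INT PRIMARY KEY,
    product_name  VARCHAR(100),
    category_name VARCHAR(100)   -- category attributes kept inline (denormalized)
);

CREATE TABLE fact_sales (
    product_key  INT REFERENCES dim_product (product_key),
    date_key     INT,            -- would reference a date dimension
    sales_amount DECIMAL(12, 2),
    PRIMARY KEY (product_key, date_key)
);

-- Snowflake schema: the dimension is normalized (3NF) into more tables
-- linked by primary/foreign key relationships.
CREATE TABLE dim_category (
    category_key  INT PRIMARY KEY,
    category_name VARCHAR(100)
);

CREATE TABLE dim_product_sf (
    product_key  INT PRIMARY KEY,
    product_name VARCHAR(100),
    category_key INT REFERENCES dim_category (category_key)
);
```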
Ans: Data sparsity is a term used for how much data we have for a particular dimension/entity of the model.
It affects aggregation depending on how deep the combinations of members of the sparse dimension go. If there are many combinations and those combinations carry no factual data, creating space to store aggregations for them is wasteful, and as a result the database becomes huge.
Ans: In DataStage server jobs, a Sequential File stage can be used for a lookup in place of a Hashed File stage, but the Hashed File stage has a clear advantage: search is faster in hashed files because the hash algorithm gives a record's address directly (records are stored by hash), whereas in a sequential file all records must be compared one by one.
Ans: Denormalization is used when many tables are involved in retrieving data. Denormalization is done in dimensional modelling to construct a data warehouse. It is not usually done for the databases of transactional systems.
Ans: Data models are tools used in the analysis to describe the data requirements and assumptions in the system from a top-down perspective. They also set the stage for the design of databases later on in the SDLC.
There are three basic elements in ER models: entities, relationships, and attributes.
Ans: An entity is in the third normal form if it is in the second normal form and none of its attributes is transitively dependent on the primary key. Transitive dependence means that descriptor key attributes depend not only on the whole primary key but also on other descriptor key attributes that, in turn, depend on the primary key. In SQL terms, the third normal form means that no column within a table is dependent on a descriptor column that, in turn, depends on the primary key.
For 3NF, first, the table must be in 2NF; in addition, we want to make sure that the non-key fields are dependent upon ONLY the PK, and not on other non-key fields. This is very similar to 2NF, except that now you are comparing the non-key fields to OTHER non-key fields. After all, we know that the relationship with the PK is good because we established that in 2NF.
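A minimal sketch of removing a transitive dependency (the columns are hypothetical): here zip_code determines city, so city depends on the primary key only transitively until it is moved out:

```sql
-- Not in 3NF: city depends on zip_code, which in turn depends on
-- the primary key emp_id (a transitive dependency).
CREATE TABLE employee_bad (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(100),
    zip_code CHAR(5),
    city     VARCHAR(100)   -- determined by zip_code, not directly by emp_id
);

-- In 3NF: the transitively dependent attribute moves to its own table.
CREATE TABLE zip (
    zip_code CHAR(5) PRIMARY KEY,
    city     VARCHAR(100)
);

CREATE TABLE employee (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(100),
    zip_code CHAR(5) REFERENCES zip (zip_code)
);
```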
Ans: Recursive relationships are an interesting and more complex concept than relationships such as one-to-one, one-to-many, and many-to-many. A recursive relationship occurs when there is a relationship between an entity and itself.
For example, a one-to-many recursive relationship occurs when an employee is the manager of other employees. The employee entity is related to itself, and there is a one-to-many relationship between one employee (the manager) and many other employees (the people who report to the manager). Because of the more complex nature of these relationships, we need slightly more complex methods of mapping them to a schema.
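A minimal sketch of how the employee-manager example above is commonly mapped to a schema (names are illustrative): a self-referencing foreign key, queried with a self-join:

```sql
-- One-to-many recursive relationship: an employee's manager is
-- another row in the same table (self-referencing foreign key).
CREATE TABLE employee (
    emp_id     INT PRIMARY KEY,
    emp_name   VARCHAR(100),
    manager_id INT REFERENCES employee (emp_id)  -- NULL for the top manager
);

-- Listing each employee with his or her manager requires a self-join.
SELECT e.emp_name, m.emp_name AS manager_name
FROM   employee e
       LEFT JOIN employee m ON e.manager_id = m.emp_id;
```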
Ans: Using a name as the primary key violates the principle of stability. The social security number might be a valid choice, but a foreign employee might not have a social security number. This is a case where a derived, rather than a natural, primary key is appropriate. A derived key is an artificial key that you create; a natural key is one that is already part of the data.
Ans: An entity is in the second normal form if all of its attributes depend on the whole (primary) key. In relational terms, every column in a table must be functionally dependent on the whole primary key of that table. Functional dependency indicates that a link exists between the values in two different columns.
If the value of an attribute depends on a column, the value of the attribute must change when the value in that column changes; the attribute is a function of the column.
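A minimal sketch (with hypothetical order and product tables) of moving a table into 2NF by removing a partial dependency on a composite key:

```sql
-- Not in 2NF: product_name depends only on product_id, i.e. on part
-- of the composite primary key (order_id, product_id).
CREATE TABLE order_line_bad (
    order_id     INT,
    product_id   INT,
    product_name VARCHAR(100),  -- partial dependency on product_id alone
    quantity     INT,
    PRIMARY KEY (order_id, product_id)
);

-- In 2NF: attributes that depend on only part of the key move out.
CREATE TABLE product (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(100)
);

CREATE TABLE order_line (
    order_id   INT,
    product_id INT REFERENCES product (product_id),
    quantity   INT,              -- depends on the whole composite key
    PRIMARY KEY (order_id, product_id)
);
```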
Ans: In general, organizational databases are normalized to 3NF in order to remove redundancy and allow efficient access. A database can also be created without normalization, so it is not mandatory that a database be in 3NF.
Ans: A conceptual data model includes all major entities and relationships and does not contain much detailed information about attributes; it is often used in the INITIAL PLANNING PHASE.
The conceptual data model is created by gathering business requirements from various sources like business documents, discussions with functional teams, business analysts, subject matter experts, and end-users who do the reporting on the database. Data modellers create a conceptual data model and forward it to the functional team for review.
Conceptual Data Model - Highlights
In CDM discussions, both technical and non-technical teams project their ideas for building a sound logical data model.
Ans: The development of a common consistent view and understanding of data elements and their relationships across the enterprise is referred to as Enterprise Data Modeling. This type of data modelling provides access to information scattered throughout an enterprise under the control of different divisions or departments with different databases and data models.
Enterprise Data Modeling is sometimes called a global business model, and the entire information about the enterprise would be captured in the form of entities.
Ans: When a data modeller works with the client, his title may be logical data modeller, physical data modeller, or a combination of both. A logical data modeller designs the data model to suit business requirements, creates and maintains the lookup data, compares versions of the data model, maintains a change log, and generates reports from the data model, whereas a physical data modeller has to know the properties of the source and target databases.
A physical data modeller should have the technical know-how to create data models from existing databases and to tune the data models with referential integrity, alternate keys, and indexes, and should know how to match indexes to SQL code. It is also good if the physical data modeller knows about replication, clustering, and so on.
The differences between a logical data model and a physical data model are shown below.
| Logical Data Modeling | Physical Data Modeling |
| --- | --- |
| Represents business information and defines business rules | Represents the physical implementation of the model in a database |
| Entity | Table |
| Attribute | Column |
| Primary Key | Primary Key Constraint |
| Alternate Key | Unique Constraint or Unique Index |
| Inversion Key Entry | Non-Unique Index |
| Rule | Check Constraint, Default Value |
| Relationship | Foreign Key |
| Definition | Comment |
Ans: Relational data modeling is used in OLTP systems, which are transaction-oriented, and dimensional data modeling is used in OLAP systems, which are analytical. In a data warehouse environment, the staging area is designed on OLTP concepts, since data has to be normalized, cleansed, and profiled before being loaded into a data warehouse or data mart. In an OLTP environment, lookups are stored as independent tables in detail, whereas these independent tables are merged into a single dimension in an OLAP environment like a data warehouse.
| Relational Data Modeling | Dimensional Data Modeling |
| --- | --- |
| Data is stored in RDBMS | Data is stored in RDBMS or multidimensional databases |
| Tables are units of storage | Cubes are units of storage |
| Data is normalized and used for OLTP; optimized for OLTP processing | Data is denormalized and used in the data warehouse and data mart; optimized for OLAP |
| Several tables and chains of relationships among them | Few tables; fact tables are connected to dimension tables |
| Volatile (several updates) and time-variant | Non-volatile and time-invariant |
| Detailed level of transactional data | Summary of bulky transactional data (aggregates and measures) used in business decisions |
Ans:
Standardization Needs in Modelling Data:
Several data modelers may work on different subject areas of a data model, and all of them should use the same naming conventions when writing definitions and business rules.
Nowadays, business-to-business (B2B) transactions are quite common, and standardization helps in understanding the business in a better way. Inconsistency across column names and definitions would create chaos across the business.
For example, when a data warehouse is designed, it may get data from several source systems, and each source may have its own names, data types, etc. Such anomalies can be eliminated if proper standardization is maintained across the organization.
Table Names Standardization:
Giving a full name to a table gives an idea of what its data is about. Generally, do not abbreviate table names; however, this may differ according to the organization's standards. If the table name length exceeds the database's limits, then abbreviate it. Some general guidelines that may be used as a prefix or suffix for tables are listed below.
Examples:
- Lookup – LKP: used for code/type tables by which a fact table can be directly accessed, e.g. Credit Card Type Lookup – CREDIT_CARD_TYPE_LKP
- Fact – FCT: used for transaction tables, e.g. Credit Card Fact – CREDIT_CARD_FCT
- Cross Reference – XREF: tables that resolve many-to-many relationships, e.g. Credit Card Member XREF – CREDIT_CARD_MEMBER_XREF
- History – HIST: tables that store history, e.g. Credit Card Retired History – CREDIT_CARD_RETIRED_HIST
- Statistics – STAT: tables that store statistical information, e.g. Credit Card Web Statistics – CREDIT_CARD_WEB_STAT
Column Names Standardization:
Some general guidelines that may be used as a prefix or suffix for columns are listed below.
Examples:
- Key – KEY: system-generated surrogate key, e.g. Credit Card Key – CRDT_CARD_KEY
- Identifier – ID: character column used as an identifier, e.g. Credit Card Identifier – CRDT_CARD_ID
- Code – CD: numeric or alphanumeric column used as an identifying attribute, e.g. State Code – ST_CD
- Description – DESC: description for a code, identifier, or key, e.g. State Description – ST_DESC
- Indicator – IND: denotes indicator (flag) columns, e.g. Gender Indicator – GNDR_IND
Database Parameters Standardization:
Some general guidelines that may be used for other physical parameters are listed below.
Examples:
- Index – IDX: for index names, e.g. Credit Card Fact IDX 01 – CRDT_CARD_FCT_IDX01
- Primary Key – PK: for primary key constraint names, e.g. Credit Card Fact PK01 – CRDT_CARD_FCT_PK01
- Alternate Key – AK: for alternate key names, e.g. Credit Card Fact AK01 – CRDT_CARD_FCT_AK01
- Foreign Key – FK: for foreign key constraint names, e.g. Credit Card Fact FK01 – CRDT_CARD_FCT_FK01
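Putting these guidelines together, a sketch of what the resulting DDL might look like (the tables and columns are the illustrative credit-card examples above, not a real system):

```sql
-- Lookup table (LKP suffix) with a surrogate key (KEY suffix).
CREATE TABLE CREDIT_CARD_TYPE_LKP (
    CRDT_CARD_TYPE_KEY  INT NOT NULL,
    CRDT_CARD_TYPE_CD   CHAR(2),       -- CD suffix: identifying code
    CRDT_CARD_TYPE_DESC VARCHAR(50),   -- DESC suffix: description of the code
    CONSTRAINT CRDT_CARD_TYPE_LKP_PK01 PRIMARY KEY (CRDT_CARD_TYPE_KEY)
);

-- Fact table (FCT suffix); constraint and index names carry _PK/_FK/_IDX.
CREATE TABLE CREDIT_CARD_FCT (
    CRDT_CARD_KEY      INT NOT NULL,
    CRDT_CARD_TYPE_KEY INT NOT NULL,
    TXN_AMT            DECIMAL(12, 2),
    CONSTRAINT CRDT_CARD_FCT_PK01 PRIMARY KEY (CRDT_CARD_KEY),
    CONSTRAINT CRDT_CARD_FCT_FK01 FOREIGN KEY (CRDT_CARD_TYPE_KEY)
        REFERENCES CREDIT_CARD_TYPE_LKP (CRDT_CARD_TYPE_KEY)
);

CREATE INDEX CRDT_CARD_FCT_IDX01 ON CREDIT_CARD_FCT (CRDT_CARD_TYPE_KEY);
```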
Ans: These are general guidelines for creating a standard data model; in real time, a data model may not be created in exactly this sequential manner. Based on the enterprise's requirements, some steps may be excluded, or others included in addition to these.
Sometimes, the data modeler may be asked to develop a data model based on the existing database. In that situation, the data modeler has to reverse engineer the database and create a data model.
For each release (version) of the data model, compare the present version with the previous one. Similarly, compare the data model with the database to find out the differences.
Create a change log document for differences between the current version and the previous version of the data model.
When an enterprise logical data model is transformed into a physical data model, supertypes and subtypes may not be carried over as-is; i.e., the logical and physical structures of supertypes and subtypes may be entirely different, and the data modeller has to adjust them according to the physical and reporting requirements.
When an enterprise logical data model is transformed into a physical data model, the length of table names, column names, etc. may exceed the maximum number of characters allowed by the database, so the data modeller has to manually edit the physical names according to the database's or organization's standards.
One of the important things to note is the standardization of the data model. Since the same attribute may be present in several entities, the attribute names and data types should be standardized and a conformed dimension should be used to connect to the same attribute present in several tables.
A standard Abbreviation document is a must so that all data structure names would be consistent across the data model.
Dimensional Data Modeling:
It is a modelling technique used in data warehousing systems, different from the ER modelling technique used in OLTP systems. In dimensional modelling, tables are combined with the aim of optimized query performance in decision support systems on relational databases.
Before going further into the details of dimensional data modelling, we should be aware of its key concepts.
Ans: A starflake schema is a combination of a star schema and a snowflake schema. The hierarchies of star schemas are denormalized, while the hierarchies of snowflake schemas are normalized; a starflake schema normalizes some dimensions to remove redundancy while leaving others denormalized.
Ans: In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity-relationship diagram resembles a snowflake shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions.
Ans: A conceptual model is a representation of a system, made of the composition of concepts that are used to help people know, understand, or simulate a subject the model represents. Some models are physical objects; for example, a toy model may be assembled and may be made to work like the object it represents.
Ans: Data marts are generally designed for a single subject area. An organization may have data pertaining to different departments like Finance, HR, Marketing etc. stored in a data warehouse and each department may have separate data marts. These data marts can be built on top of the data warehouse.
Ans: ER model or entity-relationship model is a particular methodology of data modelling wherein the goal of modelling is to normalize the data by reducing redundancy. This is different from dimensional modeling where the main goal is to improve the data retrieval mechanism.
Ans: A dimensional model consists of dimension and fact tables. Fact tables store different transactional measurements and the foreign keys to the dimension tables that qualify the data. The goal of the dimensional model is not to achieve a high degree of normalization but to facilitate easy and fast data retrieval.
Ralph Kimball is one of the strongest proponents of this very popular data modelling technique which is often used in many enterprise-level data warehouses.
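As an illustrative sketch of this easy retrieval (assuming a dim_date dimension alongside the fact_sales and dim_product tables sketched earlier; all names are hypothetical), a typical star-join query aggregates a fact qualified by its dimensions:

```sql
-- A typical star-join: the fact table's foreign keys connect to the
-- dimension tables that qualify the measures being aggregated.
SELECT d.calendar_year,
       p.product_name,
       SUM(f.sales_amount) AS total_sales
FROM   fact_sales f
       JOIN dim_date    d ON f.date_key    = d.date_key
       JOIN dim_product p ON f.product_key = p.product_key
GROUP  BY d.calendar_year, p.product_name;
```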
If you want to read a quick and simple guide on dimensional modelling, please check our Guide to dimensional modelling.
Ans: A dimension is something that qualifies a quantity (measure).
For example, consider this: if I just say "20kg", it does not mean anything. But if I say, "20kg of rice (product) was sold to Ramesh (customer) on 5th April (date)", then it makes meaningful sense. Product, customer, and date are the dimensions that qualify the measure 20kg.
Dimensions are mutually independent. Technically speaking, a dimension is a data element that categorizes each item in a data set into non-overlapping regions.
Ans: A fact is something that is quantifiable (Or measurable). Facts are typically (but not always) numerical values that can be aggregated.
Ans:
Non-additive Measures:
Non-additive measures are those which cannot be used inside any numeric aggregation function (e.g. SUM(), AVG(), etc.). One example of a non-additive fact is any kind of ratio or percentage, such as a 5% profit margin or a revenue-to-asset ratio. Non-numerical data can also be a non-additive measure when stored in fact tables, e.g. varchar flags in the fact table.
Semi-Additive Measures:
Semi-additive measures are those to which only a subset of aggregation functions can be applied. Take account balance: a SUM() over balances does not give a useful result, but the MAX() or MIN() balance might be useful. Similarly, for a price or currency rate, a sum is meaningless, but the average might be useful.
Additive Measures:
Additive measures can be used with any aggregation function like SUM(), AVG(), etc. Examples are sales quantity, sales amount, etc.
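To make the three categories concrete, here is a small sketch (the fact tables and columns shown are hypothetical) of which aggregation functions make sense for each:

```sql
-- Additive: SUM() is meaningful across any dimension.
SELECT SUM(sales_quantity) AS total_units
FROM   fact_sales;

-- Semi-additive: SUM() of balances across time is meaningless,
-- but MIN()/MAX() (or the latest balance) are useful.
SELECT account_key, MAX(balance) AS peak_balance
FROM   fact_account_balance
GROUP  BY account_key;

-- Non-additive: a stored ratio cannot be summed; recompute it
-- from its additive components instead.
SELECT SUM(profit) / SUM(revenue) AS profit_margin
FROM   fact_sales;
```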
At this point, I will request you to pause and take some time to read the article on "Classifying data for successful modelling". That article helps you understand the differences between dimensional and factual data from a fundamental perspective.
Ans: A junk dimension is a grouping of typically low-cardinality attributes (flags, indicators, etc.) so that those can be removed from other tables and can be junked into an abstract dimension table.
These junk dimension attributes might not be related. The only purpose of this table is to store all the combinations of the dimensional attributes which you could not fit into the different dimension tables otherwise. Junk dimensions are often used to implement Rapidly Changing Dimensions in the data warehouse.
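A minimal sketch of a junk dimension (the flags shown are hypothetical):

```sql
-- A junk dimension collects unrelated low-cardinality flags into one
-- table; the fact table carries a single key instead of many flag columns.
CREATE TABLE junk_dim (
    junk_key        INT PRIMARY KEY,
    is_gift_wrapped CHAR(1),      -- 'Y' / 'N'
    payment_type    VARCHAR(10),  -- e.g. 'CARD', 'CASH'
    order_channel   VARCHAR(10)   -- e.g. 'WEB', 'STORE'
);

-- Pre-populate the valid combinations of the flags once...
INSERT INTO junk_dim VALUES (1, 'Y', 'CARD', 'WEB');
INSERT INTO junk_dim VALUES (2, 'N', 'CARD', 'WEB');
-- ...and so on; the fact table then references junk_key.
```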
Ans: In practice, a surrogate key is a system-generated numeric attribute that is enforced as the primary key. It is a replacement for natural keys: instead of using a natural primary key or a composite primary key, data modelers create a surrogate key, which is very useful for identifying a record, for writing SQL queries, and for performance.
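A small sketch of a surrogate key alongside the natural key it replaces (syntax varies by database; this uses the SQL-standard IDENTITY form):

```sql
-- The surrogate key is system-generated and stable; the natural key
-- (here, SSN) is kept only as an ordinary attribute.
CREATE TABLE customer_dim (
    customer_key  INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_ssn  VARCHAR(11),   -- natural key; may be absent for some rows
    customer_name VARCHAR(100)
);
```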
Ans: Forward Engineering is a process by which Data Definition Language(DDL) scripts are generated from the data model. Data modeling tools have some options to create DDL scripts by coupling or connecting with several databases. By using these scripts, databases can be created. Reverse Engineering is a process used for creating data models from databases or scripts. Data modeling tools have some options to connect with the database through which we can reverse engineer a database into a data model.